NSF PAR Search | NSF Public Access Repository

Adaptive Algorithms with Sharp Convergence Rates for Stochastic Hierarchical Optimization

Gong, Xiaochuan; Hao, Jie; Liu, Mingrui (October 2025, Neural Information Processing Systems)

Free, publicly-accessible full text available October 28, 2026
Constant Stepsize Local GD for Logistic Regression: Acceleration by Instability

Crawshaw, Michael; Woodworth, Blake; Liu, Mingrui (July 2025, International Conference on Machine Learning)

Free, publicly-accessible full text available July 23, 2026
Complexity Lower Bounds of Adaptive Gradient Algorithms for Non-convex Stochastic Optimization under Relaxed Smoothness

Crawshaw, Michael; Liu, Mingrui (March 2025, International Conference on Learning Representations)

Free, publicly-accessible full text available March 1, 2026
Local Steps Speed Up Local GD for Heterogeneous Distributed Logistic Regression

Crawshaw, Michael; Woodworth, Blake; Liu, Mingrui (March 2025, International Conference on Learning Representations)

Free, publicly-accessible full text available March 1, 2026
Federated Learning under Periodic Client Participation and Heterogeneous Data: A New Communication-Efficient Algorithm and Analysis

https://doi.org/10.52202/079017-0265

Crawshaw, Michael; Liu, Mingrui (January 2025, Neural Information Processing Systems Foundation, Inc. (NeurIPS))

Full Text Available
An Accelerated Algorithm for Stochastic Bilevel Optimization under Unbounded Smoothness

https://doi.org/10.52202/079017-2486

Gong, Xiaochuan; Hao, Jie; Liu, Mingrui (January 2025, Neural Information Processing Systems Foundation, Inc. (NeurIPS))

Full Text Available
Will bilevel optimizers benefit from loops

Ji, Kaiyi; Liu, Mingrui; Liang, Yingbin; Ying, Lei (December 2022, Proc. Advances in Neural Information Processing Systems (NeurIPS))

Full Text Available
Robustness to Unbounded Smoothness of Generalized SignSGD

Crawshaw, Michael; Liu, Mingrui; Orabona, Francesco; Zhang, Wei; Zhuang, Zhenxun (November 2022, Advances in neural information processing systems)
Oh, Alice H.; Agarwal, Alekh; Belgrave, Danielle; Cho, Kyunghyun (Ed.)
Traditional analyses in non-convex optimization typically rely on the smoothness assumption, namely requiring the gradients to be Lipschitz. However, recent evidence shows that this smoothness condition does not capture the properties of some deep learning objective functions, including the ones involving Recurrent Neural Networks and LSTMs. Instead, they satisfy a much more relaxed condition, with potentially unbounded smoothness. Under this relaxed assumption, it has been theoretically and empirically shown that gradient-clipped SGD has an advantage over vanilla SGD. In this paper, we show that clipping is not indispensable for Adam-type algorithms in tackling such scenarios: we theoretically prove that a generalized SignSGD algorithm can obtain convergence rates similar to those of SGD with clipping without needing explicit clipping at all. This family of algorithms on one end recovers SignSGD and on the other end closely resembles the popular Adam algorithm. Our analysis underlines the critical role that momentum plays in analyzing SignSGD-type and Adam-type algorithms: it not only reduces the effects of noise, thus removing the need for the large mini-batches required in previous analyses of SignSGD-type algorithms, but it also substantially reduces the effects of unbounded smoothness and gradient norms. To the best of our knowledge, this work is the first to show the benefit of Adam-type algorithms compared with non-adaptive gradient algorithms such as gradient descent in the unbounded smoothness setting. We also compare these algorithms with popular optimizers on a set of deep learning tasks, observing that we can match the performance of Adam while beating others.
Full Text Available
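
The abstract above describes a family of momentum-based updates that recovers SignSGD at one end and resembles Adam at the other. The following is a minimal sketch of one update with that property, not the paper's algorithm as published: the function name, the default hyperparameters, the `eps` safeguard, and the choice to build the second-moment buffer from the momentum are all assumptions made for illustration.

```python
import math


def generalized_signsgd_step(x, grad, m, v, lr=0.01, beta1=0.9, beta2=0.0, eps=1e-12):
    """One coordinate-wise step of a hypothetical generalized SignSGD update.

    x, grad, m, v are equal-length lists of floats.
    m is a momentum buffer; v is a running average of the squared momentum
    (an assumption of this sketch, not something stated in the abstract).
    """
    new_x, new_m, new_v = [], [], []
    for xi, gi, mi, vi in zip(x, grad, m, v):
        # Exponential moving average of the stochastic gradient.
        mi = beta1 * mi + (1.0 - beta1) * gi
        # Exponential moving average of the squared momentum.
        vi = beta2 * vi + (1.0 - beta2) * mi * mi
        # Normalized step; eps only guards against division by zero.
        xi = xi - lr * mi / (math.sqrt(vi) + eps)
        new_x.append(xi)
        new_m.append(mi)
        new_v.append(vi)
    return new_x, new_m, new_v


# With beta2 = 0 the normalizer equals |m|, so each coordinate moves by
# lr * sign(m): a SignSGD-with-momentum step.
x, m, v = generalized_signsgd_step([1.0, -2.0], [3.0, -0.5], [0.0, 0.0], [0.0, 0.0])
```

In this sketch, moving `beta2` toward 1 makes the normalizer a slowly varying average, which gives the step an Adam-like form. No explicit clipping appears anywhere; the normalization bounds each coordinate's step size when `beta2 = 0`.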
Understanding AdamW through Proximal Methods and Scale-Freeness

Zhuang, Zhenxun; Liu, Mingrui; Cutkosky, Ashok; Orabona, Francesco (August 2022, Transactions on machine learning research)

Full Text Available
On the Initialization for Convex-Concave Min-max Problems

Liu, Mingrui; Orabona, Francesco (January 2022, International Conference on Algorithmic Learning Theory)
Dasgupta, Sanjoy; Haghtalab, Nika (Ed.)
Convex-concave min-max problems are ubiquitous in machine learning, and first-order methods (e.g., gradient descent ascent) are typically used to find the optimal solution. One feature that separates convex-concave min-max problems from convex minimization problems is that the best known convergence rates for min-max problems have an explicit dependence on the size of the domain, rather than on the distance between the initial point and the optimal solution. This means that the convergence speed does not improve even if the algorithm starts from the optimal solution, and hence is oblivious to the initialization. Here, we show that strict-convexity-strict-concavity is sufficient to get the convergence rate to depend on the initialization. We also show how different algorithms can asymptotically achieve initialization-dependent convergence rates on this class of functions. Furthermore, we show that the so-called “parameter-free” algorithms make it possible to achieve improved initialization-dependent asymptotic rates without any learning rate to tune. In addition, we utilize this particular parameter-free algorithm as a subroutine to design a new algorithm, which achieves a novel non-asymptotic fast rate for strictly-convex-strictly-concave min-max problems with a growth condition and Hölder continuous solution mapping. Experiments are conducted to verify our theoretical findings and demonstrate the effectiveness of the proposed algorithms.
Full Text Available
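
The abstract above names gradient descent ascent as a typical first-order method for min-max problems. As a rough illustration only, the sketch below runs simultaneous gradient descent ascent on a toy strictly-convex-strictly-concave function. The objective f(x, y) = x^2/2 + x*y - y^2/2, the step size, and the function names are assumptions chosen for this example and do not come from the paper; this is also plain gradient descent ascent, not the parameter-free method the paper studies.

```python
import math


def gda(x, y, lr=0.1, steps=200):
    """Simultaneous gradient descent ascent on f(x, y) = x^2/2 + x*y - y^2/2.

    f is strictly convex in x and strictly concave in y, with its unique
    saddle point at (0, 0). x takes descent steps, y takes ascent steps.
    """
    for _ in range(steps):
        gx = x + y  # partial derivative of f with respect to x
        gy = x - y  # partial derivative of f with respect to y
        x, y = x - lr * gx, y + lr * gy
    return x, y


def distance_to_saddle(x, y):
    """Euclidean distance from (x, y) to the saddle point (0, 0)."""
    return math.hypot(x, y)


far = distance_to_saddle(*gda(1.0, 1.0))
at_saddle = distance_to_saddle(*gda(0.0, 0.0))
```

On this toy problem the iterates contract toward (0, 0), and a run started at the saddle point stays there, so the remaining error depends on where the run starts. The example does not reproduce any rate from the paper.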
